Robust feature extraction based on an asymmetric level-dependent auditory filterbank and a subband spectrum enhancement technique
نویسندگان
چکیده
In this paper we introduce a robust feature extractor, dubbed as robust compressive gammachirp filterbank cepstral coefficients (RCGCC), based on an asymmetric and level-dependent compressive gammachirp filterbank and a sigmoid shape weighting rule for the enhancement of speech spectra in the auditory domain. The goal of this work is to improve the robustness of speech recognition systems in additive noise and real-time reverberant environments. As a post processing scheme we employ a short-time feature normalization technique called short-time cepstral mean and scale normalization (STCMSN), which, by adjusting the scale and mean of cepstral features, reduces the difference of cepstra between the training and test environments. For performance evaluation, in the context of speech recognition, of the proposed feature extractor we use the standard noisy AURORA-2 connected digit corpus, the meeting recorder digits (MRDs) subset of the AURORA-5 corpus, and the AURORA-4 LVCSR corpus, which represent additive noise, reverberant acoustic conditions and additive noise as well as different microphone channel conditions, respectively. The ETSI advanced front-end (ETSI-AFE), the recently proposed power normalized cepstral coefficients (PNCC), conventional MFCC and PLP features are used for comparison purposes. Experimental speech recognition results demonstrate that the proposed method is robust against both additive and reverberant environments. The proposed method provides comparable results to that of the ETSI-AFE and PNCC on the AURORA-2 as well as AURORA-4 corpora and provides considerable improvements with respect to the other feature extractors on the AURORA-5 corpus.
منابع مشابه
Speech Enhancement Perceptual Modification Of
A speech denoising technique based on subband noise estimation and a perceptual modification of Wiener filtering is proposed. The noisy speech is first decomposed into critical band signals by an auditory filterbank and the denoising is carried out on the subband signals. The time varying subband noise variance required for denoising is estimated by tracking the minimum variance of the subband ...
متن کاملRegularized minimum variance distortionless response-based cepstral features for robust continuous speech recognition
In this paper, we present robust feature extractors that incorporate a regularized minimum variance distortionless response (RMVDR) spectrum estimator instead of the discrete Fourier transform-based direct spectrum estimator, used in many front-ends including the conventional MFCC, to estimate the speech power spectrum. Direct spectrum estimators, e.g., single tapered periodogram, have high var...
متن کاملRobust Asr in Reverberant Environments Using Temporal Cepstrum Smoothing for Speech Enhancement and an Amplitude Modulation Filterbank for Feature Extraction
This paper presents techniques aiming at improving automatic speech recognition (ASR) in single channel scenarios in the context of the REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge. System improvements range from speech enhancement over robust feature extraction to model adaptation and word-based integration of multiple classifiers. The selective temporal cepstrum ...
متن کاملNoise spectrum estimation using Gaussian mixture model-based speech presence probability for robust speech recognition
This work presents a noise spectrum estimator based on the Gaussian mixture model (GMM)-based speech presence probability (SPP) for robust speech recognition. Estimated noise spectrum is then used to compute a subband a posteriori signal-to-noise ratio (SNR). A sigmoid shape weighting rule is formed based on this subband a posteriori SNR to enhance the speech spectrum in the auditory domain, wh...
متن کاملVoice biometric feature using Gammatone filterbank and ICA
Voice biometric feature extraction is the core task in developing any speaker identification system. This paper proposes a robust feature extraction technique for the purpose of speaker identification. The technique is based on processing monaural speech signal using human auditory system based Gammatone Filterbank (GTF) and Independent Component Analysis (ICA). The measures used to assess the ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Digital Signal Processing
دوره 29 شماره
صفحات -
تاریخ انتشار 2014